Viterbi decoding for latent words language models using gibbs sampling

نویسندگان

Ryo Masumura

Hirokazu Masataki

Takanobu Oba

Osamu Yoshioka

Satoshi Takahashi

چکیده

This paper introduces a new approach that directly uses latent words language models (LWLMs) in automatic speech recognition (ASR). LWLMs are effective against data sparseness because of their soft-decision clustering structure and Bayesian modeling so it can be expected that LWLMs perform robustly in multiple ASR tasks. Unfortunately, implementing a LWLM to ASR is difficult because of its computation complexity. In our previous work, we implemented an approximate LWLM for ASR by sampling words according to a stochastic process and training a word n-gram LMs. However, the previous approach cannot take into account the latent variable sequence behind the recognition hypothesis. To solve this problem, we propose a method based on Viterbi decoding that simultaneously decodes the recognition hypothesis and its latent variable sequence. In the proposed method, we use Gibbs sampling for rapid decoding. Our experiments show the effectiveness of the proposed Viterbi decoding based on n-best rescoring. Moreover, we also investigate the effects on the combination of the previous approximate LWLM and the proposed Viterbi decoding.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Decoding Running Key Ciphers

There has been recent interest in the problem of decoding letter substitution ciphers using techniques inspired by natural language processing. We consider a different type of classical encoding scheme known as the running key cipher, and propose a search solution using Gibbs sampling with a word language model. We evaluate our method on synthetic ciphertexts of different lengths, and find that...

متن کامل

Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling

Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic model...

متن کامل

The Analysis of Bayesian Probit Regression of Binary and Polychotomous Response Data

The goal of this study is to introduce a statistical method regarding the analysis of specific latent data for regression analysis of the discrete data and to build a relation between a probit regression model (related to the discrete response) and normal linear regression model (related to the latent data of continuous response). This method provides precise inferences on binary and multinomia...

متن کامل

A Latent Dirichlet Framework for Relevance Modeling

Relevance-based language models operate by estimating the probabilities of observing words in documents relevant (or pseudo relevant) to a topic. However, these models assume that if a document is relevant to a topic, then all tokens in the document are relevant to that topic. This could limit model robustness and effectiveness. In this study, we propose a Latent Dirichlet relevance model, whic...

متن کامل

Latent Variable Perceptron Algorithm for Structured Classification

We propose a perceptron-style algorithm for fast discriminative training of structured latent variable model, and analyzed its convergence properties. Our method extends the perceptron algorithm for the learning task with latent dependencies, which may not be captured by traditional models. It relies on Viterbi decoding over latent variables, combined with simple additive updates. Compared to e...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2013

Viterbi decoding for latent words language models using gibbs sampling

نویسندگان

چکیده

منابع مشابه

Decoding Running Key Ciphers

Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling

The Analysis of Bayesian Probit Regression of Binary and Polychotomous Response Data

A Latent Dirichlet Framework for Relevance Modeling

Latent Variable Perceptron Algorithm for Structured Classification

عنوان ژورنال:

اشتراک گذاری